cabinet 1
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
Shuzheng Si, Haozhe Zhao, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
Agents based on large language models (LLMs) struggle with blind trial-and-error and hallucinated actions in long-horizon tasks because they lack global planning. In this paper, we introduce a plan-and-execute framework and propose EAGLET, an efficient and effective planner training method that enhances the executor agent's planning abilities without human effort. Specifically, we train a plug-and-play global planner through a two-stage process: we first synthesize high-quality plans from an advanced LLM using our proposed homologous consensus filtering strategy and apply fine-tuning as a cold start; we then further improve the planner with a rule-based reinforcement learning stage driven by a novel executor capability gain reward, ensuring it can handle task instructions of varying difficulty. Experiments on three long-horizon agent tasks show that executor agents equipped with our planner outperform existing methods and achieve new state-of-the-art performance. Meanwhile, EAGLET reduces training costs by 8x compared to RL-based baselines and requires no manual effort or extra training data, offering an efficient and effective solution.
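The two training signals the EAGLET abstract names, homologous consensus filtering and the executor capability gain reward, can be made concrete. Below is a minimal Python sketch assuming a hypothetical `executor.run(task, plan=...)` interface that reports boolean success; the paper's exact filtering criterion and reward shaping may differ.

```python
# Sketch of EAGLET's two signals (interface and thresholds are assumptions,
# not the paper's definitions).

from statistics import mean

def consensus_filter(task, plan, executors, threshold=1.0):
    """Keep a synthesized plan only if homologous executor agents
    all (or nearly all) succeed on the task when given the plan."""
    successes = [ex.run(task, plan=plan).success for ex in executors]
    return mean(successes) >= threshold

def capability_gain_reward(task, plan, executor, n_rollouts=4):
    """Reward a plan by how much it lifts the executor's success rate
    relative to acting with no global plan at all."""
    with_plan = mean(executor.run(task, plan=plan).success for _ in range(n_rollouts))
    without_plan = mean(executor.run(task, plan=None).success for _ in range(n_rollouts))
    return with_plan - without_plan  # positive only if the plan actually helps
```

The gain-style reward is what lets the planner adapt to task difficulty: on tasks the executor already solves, a plan earns little, so training pressure concentrates on tasks where planning changes the outcome.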
ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
Jeonghye Kim, Sojeong Rhee, Minbeom Kim, Dohyung Kim, Sangmook Lee, Youngchul Sung, Kyomin Jung
Recent advances in LLM agents have largely built on reasoning backbones like ReAct, which interleave thought and action in complex environments. However, ReAct often produces ungrounded or incoherent reasoning steps, leading to misalignment between the agent's actual state and goal. Our analysis finds that this stems from ReAct's inability to maintain consistent internal beliefs and goal alignment, causing compounding errors and hallucinations. To address this, we introduce ReflAct, a novel backbone that shifts reasoning from merely planning next actions to continuously reflecting on the agent's state relative to its goal. By explicitly grounding decisions in states and enforcing ongoing goal alignment, ReflAct dramatically improves strategic reliability. This design delivers substantial empirical gains: ReflAct surpasses ReAct by 27.7% on average, achieving a 93.3% success rate in ALFWorld. Notably, ReflAct even outperforms ReAct with added enhancement modules (e.g., Reflexion, WKM), showing that strengthening the core reasoning backbone is key to reliable agent performance.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Law (0.94)
- Materials > Metals & Mining (0.46)
- Leisure & Entertainment > Games (0.45)
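A minimal sketch of the reflect-then-act step the ReflAct abstract describes, in contrast to ReAct's plan-next-action loop: before committing to an action, the agent emits a reflection grounding its believed state against the goal. The `llm` callable and the prompt wording are stand-ins, not the paper's templates.

```python
# ReflAct-style decision step (prompts are illustrative assumptions).

def reflact_step(llm, goal, observation, history):
    # First, reflect: ground the agent's current state relative to its goal.
    reflection = llm(
        f"Goal: {goal}\nObservation: {observation}\nHistory: {history}\n"
        "Reflect: where am I now relative to the goal, and what remains to be done?"
    )
    # Then act, conditioned on that state-goal reflection rather than on a
    # free-floating chain of thought.
    action = llm(
        f"Goal: {goal}\nObservation: {observation}\nReflection: {reflection}\n"
        "Next action:"
    )
    history.append((reflection, action))
    return action
```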
TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
Kechen Jiao, Zhirui Fang, Jiahao Liu, Bei Li, Qifan Wang, Xinyu Liu, Junhao Ruan, Zhongjian Qiao, Yifan Zhu, Yaxin Xu, Jingang Wang, Xiu Li
Exploiting the generalization capabilities of vision-language models (VLMs) in context-specific, dynamic tasks for embodied artificial intelligence remains a significant challenge. Although supervised fine-tuned models align better with the real physical world, they still exhibit sluggish responses and hallucination issues in dynamically changing environments, necessitating further alignment. Existing post-SFT methods, which rely on reinforcement learning and chain-of-thought (CoT) approaches, are constrained by sparse rewards and action-only optimization, resulting in low sample efficiency, poor consistency, and model degradation. To address these issues, this paper proposes Thought-Centric Preference Optimization (TCPO) for effective embodied decision-making. Specifically, TCPO introduces a stepwise preference-based optimization approach that transforms sparse reward signals into richer step-level sample pairs. It emphasizes alignment of the model's intermediate reasoning process, mitigating model degradation. Moreover, by incorporating an Action Policy Consistency Constraint (APC), it further imposes consistency constraints on the model output. Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, a 6% improvement over RL4VLM, validating the effectiveness of our approach in mitigating model degradation after fine-tuning. These results highlight the potential of integrating preference-based learning with CoT processes to enhance the decision-making capabilities of vision-language models in embodied agents.
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Beijing > Beijing (0.04)
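One way the TCPO abstract's idea of turning sparse episode rewards into stepwise preference pairs could look in code: pair aligned steps from a successful and a failed rollout of the same task, then apply a DPO-style preference loss over the thought-plus-action log-probabilities with an added consistency penalty. The pairing rule, the preference term, and the APC penalty below are illustrative guesses, not the paper's definitions.

```python
# Rough TCPO-style sketch (pairing rule and loss terms are assumptions).

import torch
import torch.nn.functional as F

def step_preference_pairs(success_traj, failure_traj):
    """Turn one sparse success/failure signal into many step-level pairs
    by aligning steps of a winning and a losing rollout of the same task."""
    return list(zip(success_traj, failure_traj))

def tcpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, action_kl, beta=0.1, lam=0.01):
    # DPO-style stepwise preference term over thought+action log-probs
    # of the preferred (w) vs. dispreferred (l) steps ...
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    pref = -F.logsigmoid(margin).mean()
    # ... plus an APC-like penalty keeping the action distribution close
    # to the reference policy, guarding against model degradation.
    return pref + lam * action_kl.mean()
```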
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta, Sangjae Bae, Kwonjoon Lee
Vision-Language Models (VLMs) have recently shown promising advancements in sequential decision-making tasks through task-specific fine-tuning. However, common fine-tuning methods, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) techniques like Proximal Policy Optimization (PPO), have notable limitations: SFT assumes Independent and Identically Distributed (IID) data, while PPO focuses on maximizing cumulative rewards. These limitations often restrict solution diversity and hinder generalization in multi-step reasoning tasks. To address these challenges, we introduce GFlowVLM, a novel framework that fine-tunes VLMs using Generative Flow Networks (GFlowNets) to promote the generation of diverse solutions for complex reasoning tasks. GFlowVLM models the environment as a non-Markovian decision process, allowing it to capture the long-term dependencies essential for real-world applications. It takes observations and task descriptions as inputs to prompt chain-of-thought (CoT) reasoning, which subsequently guides action selection, and it uses task-based rewards to fine-tune the VLM with GFlowNets. This approach enables VLMs to outperform prior fine-tuning methods, including SFT and RL. Empirical results demonstrate the effectiveness of GFlowVLM on complex tasks such as card games (NumberLine, BlackJack) and embodied planning tasks (ALFWorld), showing improved training efficiency, solution diversity, and stronger generalization across both in-distribution and out-of-distribution scenarios.
- Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
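For the GFlowVLM entry above, a compact sketch of the trajectory-balance objective commonly minimized in GFlowNet-style fine-tuning of autoregressive policies. The learned `log_Z`, the per-step log-prob tensor, and the reward floor are assumptions for illustration; the paper's exact objective may differ.

```python
# Trajectory balance (TB) sketch for GFlowNet fine-tuning of a VLM policy
# whose "states" are full interaction histories (non-Markovian). For
# autoregressive generation each sequence has a unique construction order,
# so the backward-policy term log P_B is zero and is omitted here.

import torch

def trajectory_balance_loss(log_Z, log_pf_steps, reward, eps=1e-8):
    """TB loss for one trajectory: (log Z + sum log P_F - log R(x))^2.

    log_Z:        learned scalar, log of the partition function
    log_pf_steps: tensor of per-step forward log-probs from the VLM policy
    reward:       task-based terminal reward R(x), floored at eps > 0
    """
    log_r = torch.log(torch.as_tensor(reward, dtype=torch.float32) + eps)
    return (log_Z + log_pf_steps.sum() - log_r) ** 2
```

Because the objective matches the policy's sampling probability to the reward rather than maximizing expected return, it tends to preserve multiple high-reward solutions instead of collapsing onto one, which is the diversity property the abstract emphasizes.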